Gaussian Process Dynamical Models for Phoneme Classification
Abstract
Currently, the hidden Markov model (HMM) is the predominant model studied and used for speech recognition. The study of HMMs has brought undeniable progress in speech recognition, but an equally undeniable gap remains between users' expectations and that progress. The HMM has essentially two limitations: (1) The Markovian structure of the HMM limits what it can represent, since the observations are conditionally independent given the states. Because observations are obtained on a frame basis, the HMM can model only local dependencies in speech. The structure has nevertheless remained in use because its computational benefit greatly outweighs this deficiency. (2) The HMM represents the state of the articulators as a discrete latent variable, another limitation that has been taken for granted with little justification. Continuous states have been considered in the linear dynamic model (LDM) [2]. Both limitations must be lifted to improve speech recognition performance.
Related resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Phoneme Classification using Constrained Variational Gaussian Process Dynamical System
For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure...
Speaker Independent Phoneme Classification in Continuous Speech
This paper examines statistical models for phoneme classification. We compare the performance of our phoneme classification system, which uses Gaussian mixture model (GMM) phoneme models, with systems that use hidden Markov model (HMM) phoneme models. Measurements show that our model's performance is comparable with that of HMMs in context-independent phoneme classification.
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Speech Recognition Using Time Domain Features from Phase Space Reconstructions
A speech recognition system automatically transcribes speech into text. As computing power has advanced and sophisticated tools have become available, there has been significant progress in this field. But a huge gap still exists between the performance of automatic speech recognition (ASR) systems and that of human listeners. In this thesis, a novel signal analysis technique...